The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers

Authors

  • Nigel Collier
  • Hyun Seok Park
  • Norihiro Ogata
  • Yuka Tateishi
  • Chikashi Nobata
  • Tomoko Ohta
  • Tateshi Sekimizu
  • Hisao Imai
  • Katsutoshi Ibushi
  • Jun'ichi Tsujii
Abstract

We present an outline of the genome information acquisition (GENIA) project for automatically extracting biochemical information from journal papers and abstracts. GENIA will be available over the Internet and is designed to aid in information extraction, retrieval and visualisation and to help reduce information overload on researchers. The vast repository of papers available online in databases such as MEDLINE is a natural environment in which to develop language engineering methods and tools and is an opportunity to show how language engineering can play a key role on the Internet.

Similar resources

The GENIA Project: Knowledge Acquisition from Biology Texts

Overview of Project. The GENIA project [9] (Fig. 1) seeks to automatically extract useful information from texts written by scientists to help overcome the problems caused by information overload. We intend that while the methods are customized for application in the microbiology domain, the basic methods should be generalisable to knowledge acquisition in other scientific and engineering domain...

Steps towards a GENIA Dependency Treebank

In this paper we describe on-going work aimed at creating a dependency-based annotated treebank for the biomedical domain. Our starting point is the GENIA corpus [14], a corpus of 2000 MEDLINE abstracts that has been manually annotated for various biological entities according to the GENIA Ontology. There is an exponential growth of published research in this sector, which makes it...

The GENIA Corpus: an Annotated Research Abstract Corpus in Molecular Biology Domain

With the information overload in genome-related fields, there is an increasing need for natural language processing technology to extract information from literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus from research abstracts in the MEDLINE database (GENIA corpus). We are ...

Use of OWL 2 to Facilitate a Biomedical Knowledge Base Extracted from the GENIA Corpus

The annotation of the GENIA corpus, a set of biomedical articles, targets the classification of biological entities based on their association with a domain-tailored taxonomy of categories. By applying an information extraction process to the corpus we have developed a knowledge base (KB) that includes a more comprehensive taxonomy of categories, relationships between biological entities, and...

Meta-Knowledge Annotation at the Event Level: Comparison between Abstracts and Full Papers

Biomedical literature contains rich information about events of biological relevance. Event corpora, containing classified, structured representations of important facts and findings contained within text, provide an important resource for the training of domain-specific information extraction (IE) systems. Such corpora pay little attention to the interpretation of events, e.g., whether an even...

Publication date: 1999